Computationally Easy Outlier Detection via Projection Pursuit with Finitely Many Directions
Abstract
Outlier detection methods are fundamental to all of data analysis. Ideally, they are robust, affine invariant, and computationally easy in any dimension. The powerful projection pursuit approach yields the “projection outlyingness”, which is affine invariant and highly robust and, unlike the Mahalanobis distance approach, does not impose ellipsoidal contours. However, it is computationally intensive, being obtained by taking suprema of the univariate scaled deviation outlyingness over all projections of the data onto lines. Here we introduce several outlyingness functions based on a vector of scaled deviations taken over only finitely many directions, chosen approximately uniformly over the unit hypersphere. A preliminary transformation of the data to a strong invariant coordinate system makes such vectors affine invariant. We establish foundational theory for finite vectors of scaled deviations on projections. Using artificial and real data sets, we also compare our affine invariant outlyingness functions with the usual projection outlyingness and with robust Mahalanobis distance outlyingness.
AMS 2000 Subject Classification: Primary 62H99; Secondary 62G99.
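The idea of replacing the supremum over all directions with finitely many directions can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the scaled deviation in a direction u is |u'x - median| / MAD of the projected data, draws directions as normalized Gaussian vectors (one simple way to get approximately uniform directions on the hypersphere), and combines the vector of scaled deviations by taking its maximum. The function names are hypothetical, and the preliminary invariant coordinate system transformation is omitted, so this sketch is not affine invariant.

```python
import numpy as np


def random_unit_directions(n_dirs, dim, rng):
    """Draw directions uniformly at random on the unit hypersphere (assumed scheme)."""
    g = rng.standard_normal((n_dirs, dim))
    return g / np.linalg.norm(g, axis=1, keepdims=True)


def scaled_deviations(X, directions):
    """Return an (n_points, n_dirs) array of |u'x - median| / MAD per direction."""
    proj = X @ directions.T                      # project every point onto every line
    med = np.median(proj, axis=0)                # univariate location per direction
    mad = np.median(np.abs(proj - med), axis=0)  # univariate spread per direction
    return np.abs(proj - med) / mad


def finite_projection_outlyingness(X, directions):
    """Maximum scaled deviation over the finite set of directions."""
    return scaled_deviations(X, directions).max(axis=1)


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))                # synthetic data for illustration
X[0] = [10.0, 10.0, 10.0]                        # planted outlier
U = random_unit_directions(50, 3, rng)
scores = finite_projection_outlyingness(X, U)
print(int(np.argmax(scores)), float(scores.max()))
```

The cost is one matrix product plus column-wise medians, so it grows linearly in the number of directions. Other ways of combining the vector of scaled deviations (for example a norm instead of the maximum) would fit the same structure.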
Similar resources
Outlier Detection in Multivariate Time Series via Projection Pursuit
This article uses Projection Pursuit methods to develop a procedure for detecting outliers in a multivariate time series. We show that testing for outliers in some projection directions could be more powerful than testing the multivariate series directly. The optimal directions for detecting outliers are found by numerical optimization of the kurtosis coefficient of the projected series. We pro...
Outlier Detection in Multivariate Time Series by Projection Pursuit
In this article we use projection pursuit methods to develop a procedure for detecting outliers in a multivariate time series. We show that testing for outliers in some projection directions can be more powerful than testing the multivariate series directly. The optimal directions for detecting outliers are found by numerical optimization of the kurtosis coefficient of the projected series. We ...
Approximate Document Outlier Detection Using Random Spectral Projection
Outlier detection is an important process for text document collections, but as the collection grows, the detection process becomes a computationally expensive task. Random projection has been shown to provide a good fast approximation of sparse data, such as document vectors, for outlier detection. The random samples of Fourier and cosine spectrum have been shown to provide good approximations of sparse...
Nonparametric Estimation of Nonlinear Money Demand Cointegration Equation by Projection Pursuit Methods
The money demand equation continues to attract the attention of econometricians, with a new wrinkle provided by cointegration. We use projection pursuit (PP) regressions pioneered by Friedman and Stuetzle (1981) to suggest new estimates of partials of conditional expectations of the regressands with respect to the regressors and prove their consistency. Since the usual cointegration methodology involves...
Application of Recursive Least Squares to Efficient Blunder Detection in Linear Models
In many geodetic applications a large number of observations are being measured to estimate the unknown parameters. The unbiasedness property of the estimated parameters is only ensured if there is no bias (e.g. systematic effect) or falsifying observations, which are also known as outliers. One of the most important steps towards obtaining a coherent analysis for the parameter estimation is th...